Skip to content

[RFC] commit-reach: skip STALE drain when only one merge-base needed#2109

Open
spkrka wants to merge 1 commit intogitgitgadget:masterfrom
spkrka:merge-base-early-exit
Open

[RFC] commit-reach: skip STALE drain when only one merge-base needed#2109
spkrka wants to merge 1 commit intogitgitgadget:masterfrom
spkrka:merge-base-early-exit

Conversation

@spkrka
Copy link
Copy Markdown

@spkrka spkrka commented May 8, 2026

Context for what this is all about.

I am working with a very large git monorepo and have been investigating performance issues. After some digging I ended up looking more deeply into git merge-base. I saw it had an --all parameter but the default is to only return a single merge-base. Looking through the code and adding debug timing, I realized that although the total time to compute the merge-base was high, a very small amount of time was spent finding the initial merge-base value that was later returned.

The optimization is actually quite dramatic in a large repo - runtime went down from 5000ms to 50ms, so it's roughly a 100x optimization. This comes from an exploding frontier of STALE commits to drain.

Thus, my idea is simply to return early from the function once we know what will be returned. This only works if we find a candidate that we know will not be pruned later - but fortunately if we have a commit graph with generations we will visit commits in order such that it will actually not be pruned.

CC: Derrick Stolee stolee@gmail.com

Changes since v1 (thanks Junio for the review):

  • Dropped the has_gens variable entirely. If a commit has a
    finite generation then it is in the commit-graph, and so are
    all its ancestors — no additional check is needed to know
    the queue ordering is sound. Without a commit-graph every
    commit gets INFINITY and the guard never fires. This also
    avoids the misleading interaction with callers that pass
    non-zero min_generation without having generation data.

  • Simplified the early exit guard from three conditions to two:
    !find_all && generation < GENERATION_NUMBER_INFINITY.

  • Fixed multi-line comment style per CodingGuidelines.

  • Replaced "dominate" with concrete reasoning about queue ordering.

  • Did not extract a helper function: after the simplifications above
    the inner block is four lines and reads naturally inline. The
    right boundary for a helper is not obvious (it could absorb just
    the result marking, or also the RESULT flag check, or also the
    PARENT1|PARENT2 test) and each level requires more local state
    passed by pointer. Happy to extract one if preferred.

Changes since v2 (thanks Patrick for the suggestion):

  • Replaced the boolean find_all and ignore_missing_commits
    parameters in paint_down_to_common() with a single
    enum merge_base_flags mb_flags, reducing the function from
    8 to 7 parameters. The enum is defined in commit-reach.h
    with MERGE_BASE_FIND_ALL and MERGE_BASE_IGNORE_MISSING_COMMITS.

  • Named the enum merge_base_flags rather than
    paint_down_to_common_flags since the flags express caller
    intent and are threaded through multiple layers including the
    public repo_get_merge_bases_many_dirty() API.

  • Used mb_flags as the parameter name to avoid shadowing the
    existing local int flags (commit object flags) inside
    paint_down_to_common().

cc: Patrick Steinhardt ps@pks.im

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 8, 2026

Welcome to GitGitGadget

Hi @spkrka, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that either:

  • Your Pull Request has a good description, if it consists of multiple commits, as it will be used as cover letter.
  • Your Pull Request description is empty, if it consists of a single commit, as the commit message should be descriptive enough by itself.

You can CC potential reviewers by adding a footer to the PR description with the following syntax:

CC: Revi Ewer <revi.ewer@example.com>, Ill Takalook <ill.takalook@example.net>

NOTE: DO NOT copy/paste your CC list from a previous GGG PR's description,
because it will result in a malformed CC list on the mailing list. See
example.

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

  • the lines should not exceed 76 columns,
  • the first line should be like a header and typically start with a prefix like "tests:" or "revisions:" to state which subsystem the change is about, and
  • the commit messages' body should be describing the "why?" of the change.
  • Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the Libera Chat IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this. Note that any reviewers CC'd via the list in the PR description will not actually be sent emails.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join git-mentoring@googlegroups.com, where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Libera Chat. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

@gitgitgadget gitgitgadget Bot added the new user label May 8, 2026
@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 8, 2026

There is an issue in commit a43f270:
commit-reach: skip STALE drain in paint_down_to_common when only one merge-base needed

  • First line of commit message is too long (> 76 columns)

@spkrka spkrka force-pushed the merge-base-early-exit branch from a43f270 to 09c61b0 Compare May 8, 2026 13:53
@spkrka spkrka changed the title commit-reach: skip STALE drain when only one merge-base needed [RFC] commit-reach: skip STALE drain when only one merge-base needed May 8, 2026
@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 8, 2026

There is an issue in commit 09c61b0:
commit-reach: skip STALE drain in paint_down_to_common when only one merge-base needed

  • First line of commit message is too long (> 76 columns)

@spkrka spkrka force-pushed the merge-base-early-exit branch from 09c61b0 to 50f4c7d Compare May 8, 2026 13:55
Copy link
Copy Markdown

@derrickstolee derrickstolee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta comments:

  • You can use your PR description to introduce yourself as a new contributor. It will not impact the commit message that is stored in the final repository, but helps folks get to know you.
  • I like your custom tests. I'd be interested to know if any performance tests in t/perf/ would help. You can cd t/perf/ && ./run HEAD~1 HEAD -- p<num>-<name>.sh. However: I checked the existing tests and I don't think that any will actually show meaningful results in the way your manual test does.

Comment thread t/t6010-merge-base.sh
Comment thread commit-reach.c
@derrickstolee
Copy link
Copy Markdown

/allow

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 8, 2026

User spkrka is now allowed to use GitGitGadget.

WARNING: spkrka has no public email address set on GitHub; GitGitGadget needs an email address to Cc: you on your contribution, so that you receive any feedback on the Git mailing list. Go to https://github.com/settings/profile to make your preferred email public to let GitGitGadget know which email address to use.

@spkrka spkrka force-pushed the merge-base-early-exit branch from 50f4c7d to 54cdf9b Compare May 8, 2026 14:46
@spkrka
Copy link
Copy Markdown
Author

spkrka commented May 8, 2026

/preview

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 8, 2026

Preview email sent as pull.2109.git.1778252509243.gitgitgadget@gmail.com

@spkrka
Copy link
Copy Markdown
Author

spkrka commented May 8, 2026

/submit

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 8, 2026

Submitted as pull.2109.git.1778252837132.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2109/spkrka/merge-base-early-exit-v1

To fetch this version to local tag pr-2109/spkrka/merge-base-early-exit-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2109/spkrka/merge-base-early-exit-v1

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 9, 2026

This patch series was integrated into seen via git@0b5178e.

@gitgitgadget gitgitgadget Bot added the seen label May 9, 2026
@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 9, 2026

This patch series was integrated into seen via git@b7cd8e6.

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 11, 2026

This branch is now known as kk/paint-down-to-common-optim.

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 11, 2026

This patch series was integrated into seen via git@f055420.

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 11, 2026

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> diff --git a/commit-reach.c b/commit-reach.c
> index d3a9b3ed6f..c9d2d594de 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -55,14 +55,16 @@ static int paint_down_to_common(struct repository *r,
>  				struct commit **twos,
>  				timestamp_t min_generation,
>  				int ignore_missing_commits,
> +				int find_all,
>  				struct commit_list **result)
>  {
>  	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
>  	int i;
> +	int has_gens = min_generation || corrected_commit_dates_enabled(r);

Giving a name that identifies what the commonly-used logical
expression means is a very good idea, but don't some caller pass
min_generation==infinity (i.e., not zero) when there is no
generation available?  E.g., remove_redundant_no_gen() passes
the result of calling commit_graph_generation(), and without commit
graph, we would get infinity here, right?

Given that the second user of this variable is guarded by !find_all,
the variable being true even when we shouldn't be using generation
numbers may not matter for the purpose of the early break, but then
it means that the patch squanders the perfect opportunity to clarify
what the variable means, which is not very lovely.

>  	timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
>  	struct commit_list **tail = result;
>  
> -	if (!min_generation && !corrected_commit_dates_enabled(r))
> +	if (!has_gens)
>  		queue.compare = compare_commits_by_commit_date;
>  
>  	one->object.flags |= PARENT1;
> @@ -97,6 +99,11 @@ static int paint_down_to_common(struct repository *r,
>  			if (!(commit->object.flags & RESULT)) {
>  				commit->object.flags |= RESULT;
>  				tail = commit_list_append(commit, tail);
> +				/* Generation-ordered queue: no later
> +				 * commit can dominate this one. */
> +				if (!find_all && has_gens &&
> +				    generation < GENERATION_NUMBER_INFINITY)
> +					break;

Three comments

 * See Documentation/CodingGuidelines for our preferred style for
   multi-line comments.

 * I do not think we often use "dominate" to describe relationship
   between commits, and I am not sure what you want the word to mean
   in this context.  Can you clarify?

 * The code is getting way too deeply indented.  Perhaps using a
   helper function and


                        if (!(commit->object.flags & RESULT)) {
                                if (mark_result(r, &tail, commit,
						find_all, min_generation))
                                        break;
                        }

   move the logic to mark the newly discovered commit as one of the
   possible result, and to tell the loop to terminate, to it, along
   with the comment on the meaning of its return value, may make the
   code easier to follow?

> +/* To be used only when object flags after this call no longer matter.
> + * When find_all is false and generation numbers are available, returns
> + * after finding the first merge-base, skipping the STALE drain. */

Ditto on the style.

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 11, 2026

There was a status update in the "New Topics" section about the branch kk/paint-down-to-common-optim on the Git mailing list:

"git merge-base" optimization.

Comments?
source: <pull.2109.git.1778252837132.gitgitgadget@gmail.com>

@spkrka spkrka force-pushed the merge-base-early-exit branch from 54cdf9b to 43c6bdf Compare May 11, 2026 05:29
@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 11, 2026

This patch series was integrated into seen via git@41a7672.

@spkrka spkrka force-pushed the merge-base-early-exit branch from 43c6bdf to f7b5c26 Compare May 11, 2026 05:51
@spkrka
Copy link
Copy Markdown
Author

spkrka commented May 11, 2026

/submit

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 11, 2026

Submitted as pull.2109.v2.git.1778480348118.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2109/spkrka/merge-base-early-exit-v2

To fetch this version to local tag pr-2109/spkrka/merge-base-early-exit-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2109/spkrka/merge-base-early-exit-v2

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 11, 2026

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Mon, May 11, 2026 at 06:19:08AM +0000, Kristofer Karlsson via GitGitGadget wrote:
> diff --git a/commit-reach.c b/commit-reach.c
> index d3a9b3ed6f..b4ca00bb7e 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -165,7 +175,7 @@ static int merge_bases_many(struct repository *r,
>  				     oid_to_hex(&twos[i]->object.oid));
>  	}
>  
> -	if (paint_down_to_common(r, one, n, twos, 0, 0, &list)) {
> +	if (paint_down_to_common(r, one, n, twos, 0, 0, find_all, &list)) {
>  		commit_list_free(list);
>  		return -1;
>  	}

Callsites like this are quite hard to read now with these boolean flags.
Would it be preferable to instead use flags?

    enum paint_down_to_common_flags {
        PAINT_DOWN_TO_COMMON_IGNORE_MISSING_COMMITS = (1 << 0),
        PAINT_DOWN_TO_COMMON_FIND_ALL = (1 << 1),
    };

It's more verbose of course, but that's kind of the point.

Only weirdness is that we don't only accept these flags in
`paint_down_to_common()`, but also in other functions that pass those
flags down.

Patrick

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 11, 2026

User Patrick Steinhardt <ps@pks.im> has been added to the cc: list.

@gitgitgadget
Copy link
Copy Markdown

gitgitgadget Bot commented May 11, 2026

This patch series was integrated into seen via git@dabfe27.

Commits not in the commit-graph get GENERATION_NUMBER_INFINITY and
sort to the top of the priority queue.  After those, commits with
finite generation numbers are popped in non-increasing order.
When MERGE_BASE_FIND_ALL is not set the first doubly-painted commit
with a finite generation is therefore a best merge-base: no commit
still in the queue can be a descendant of it.  Skip the expensive
STALE drain in this case.

Introduce enum merge_base_flags with MERGE_BASE_FIND_ALL and
MERGE_BASE_IGNORE_MISSING_COMMITS, replacing the two boolean
parameters in paint_down_to_common().  Thread the flags through
merge_bases_many(), get_merge_bases_many_0(), and the public
repo_get_merge_bases_many_dirty() API.  git merge-base (without
--all) passes 0, triggering the early exit.

On a 2.2M-commit merge-heavy monorepo with commit-graph:

  HEAD vs ~500:   5,229ms -> 24ms
  HEAD vs ~1000:  4,214ms -> 39ms
  HEAD vs ~5000:  3,799ms -> 46ms
  HEAD vs ~10000: 3,827ms -> 61ms

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
@spkrka spkrka force-pushed the merge-base-early-exit branch from f7b5c26 to e4dada8 Compare May 11, 2026 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants